The thermodynamics of protein aggregation reactions may underpin the
enhanced metabolic efficiency associated with heterosis, some balancing
selection, and the evolution of ploidy levels

B.R. Ginn


Abstract 


Identifying the physical basis of heterosis (or “hybrid vigor”) has remained elusive despite over a hundred years
of research on the subject. The three main theories of heterosis are dominance theory, overdominance theory, and
epistasis theory. Kacser and Burns (1981) identified the molecular basis of dominance, which has greatly enhanced
our understanding of its importance to heterosis. This paper aims to explain how overdominance, and some features
of epistasis, can similarly emerge from the molecular dynamics of proteins. Possessing multiple alleles at a gene
locus results in the synthesis of different allozymes at reduced concentrations. This in turn reduces the rate at which
each allozyme forms soluble oligomers, which are toxic and must be degraded, because allozymes co-aggregate
at low efficiencies. The model developed in this paper will be used to explain how heterozygosity can impact the
metabolic efficiency of an organism. It can also explain why the viabilities of some inbred lines seem to decline
rapidly at high inbreeding coefficients (F > 0.5), which may provide a physical basis for truncation selection for
heterozygosity. Finally, the model has implications for the ploidy level of organisms. It can explain why polyploids
are frequently found in environments where severe physical stresses promote the formation of soluble oligomers. The
model can also explain why complex organisms, which need to synthesize aggregation-prone proteins that contain
intrinsically unstructured regions (IURs) and multiple domains because they facilitate complex protein interaction
networks (PINs), tend to be diploid while haploidy tends to be restricted to relatively simple organisms.
1. Introduction

Heterosis, or “hybrid vigor”, refers to the superior performance of highly heterozygous individuals
relative to less heterozygous individuals on a number of biological metrics (Lippman and Zamir 2007;
Hochholdinger and Hoecker 2007; Birchler et al. 2010). Biologists have been aware of the phenomenon
for over a hundred years (e.g. Darwin 1876), and it has been exploited to improve crop yields
substantially over the twentieth century, especially in maize (Crow 1998; Duvick 2001), but there is
still debate over its origin (for review see Lippman and Zamir 2007; Hochholdinger and Hoecker 2007;
Birchler et al. 2010). Three different theories are usually used to explain heterosis: dominance
theory (Davenport 1908; Bruce 1910; Jones 1917), overdominance theory (Shull 1948; East 1936), and
epistasis theory (Powers 1944; Williams 1959). Dominance theory attributes the benefits of
heterozygous genotypes to the masking of recessive deleterious alleles. Proponents of overdominance
theory argue that heterozygosity itself can have benefits, even in the absence of deleterious
mutations. Proponents of epistasis argue that heterosis comes from positive interactions between
multiple gene loci. Currently, dominance theory is the most widely accepted theory of heterosis
(Crow 1998; Charlesworth and Willis 2009).

However, dominance theory is unlikely to be the sole explanation for heterosis. Five lines of
evidence from the literature for overdominance and epistasis are briefly mentioned here. First, the
performance of hybrid rice cannot be explained solely by dominance theory, and may require some
combination of overdominant and epistatic interactions (Li et al. 2001; Zhou et al. 2012). Second,
breeding experiments performed on polyploid plants appear to indicate that possessing three or more
alleles at a gene locus is more beneficial than possessing two, which is difficult to explain with
dominance theory alone (Groose et al. 1989; Riddle and Birchler 2008). Third, the rate of
heterozygosity decline that occurs after multiple generations of inbreeding is slower than predicted
by dominance theory (Rumball et al. 1994; Demontis et al. 2009; Chelo and Teotonio 2012). The slow
decline may reflect balancing selection (selection for overdominant loci) or linkage between the
measured genetic markers and deleterious alleles (associative overdominance). Fourth, haplodiploid
(Henter 2003; Tortajada et al. 2009) and selfing species (Husband and Schemske 1996) show greater
degrees of inbreeding depression than would be expected if deleterious recessive alleles alone were
responsible (although numerous very mildly deleterious alleles may explain these findings). Finally,
several authors have found evidence of heterozygosity-fitness correlations (HFCs) in wild
populations, which are taken as evidence of overdominance (Lesica and Allendorf 1992; Silva et al.
2006; Ferreira and Amos 2006; Makinen et al. 2008; Hoffman et al. 2010).

One of the appeals of the dominance theory of heterosis is that its mechanisms have firm theoretical
foundations. Population genetics theory predicts that recessive deleterious mutations should be
common in populations (Charlesworth and Willis 2009). Furthermore, Kacser and Burns (1981) presented
theoretical and experimental results that show why deleterious mutations are frequently recessive.
To date, there is no widely accepted mechanism for overdominance and selection for heterozygous
genotypes (Charlesworth and Willis 2009; Zhou et al. 2012). This paper attempts to provide such a
mechanism, which will provide an explanation for how heterozygosity can result in heterosis even in
the absence of deleterious mutations. An interesting feature of the theory presented in this paper
is that overdominant loci are inherently epistatic, which may be a reason why papers that find
evidence for overdominance also tend to find evidence for epistasis (Li et al. 2001; Zhou et al.
2012).

The central hypothesis of this paper, that the specificity of protein aggregation reactions provides
a physical basis for overdominance and heterozygous advantage, is supported by the findings of
previous papers that relate heterozygosity to protein metabolism. The earliest papers, such as Koehn
and Shumway (1982) and Hawkins et al. (1986), found that metabolic efficiency decreases and protein
turnover increases with decreasing heterozygosity in marine bivalves. More recently, Kristensen et
al. (2002) and Pedersen et al. (2005) found that inbred lines of Drosophila melanogaster produce
higher concentrations of molecular chaperones than outbred lines, which they hypothesized was due to
higher rates of protein aggregation. Both Kristensen et al. (2002) and Goff (2011) argued that these
previous findings can be explained if homozygosity for deleterious mutations leads to greater
expression of unstable proteins by inbred individuals. In contrast, Ginn (2010) attempted to show
that the previous findings could be explained by an overdominance model if protein aggregation is
assumed to be a highly specific process (see below). Mead et al. (2003) had already shown that the
specificity of prion amyloid formation resulted in balancing selection at the prion protein gene.
However, Mead et al. (2003) and Ginn (2010) only considered the benefits of heterozygosity at a
single gene locus. This paper will provide a biochemical explanation for how heterozygosity at
multiple gene loci can lead to truncation selection, which can maintain protein polymorphisms at
numerous gene loci in natural populations (King 1967; Milkman 1967; Sved et al. 1967).

One of the benefits of this paper’s theoretical approach is that it can potentially explain several
trends in the ploidy level of organisms. The reason why organisms have different ploidy levels is
still poorly understood (Otto and Gerstein 2008; Otto and Whitton 2000; Mable 2004; Madlung 2013).
Yet, the evolution of different ploidy levels is important to any theory of heterosis since
heterozygosity cannot exist in strictly haploid organisms. Most theories that attempt to explain the
evolution of higher ploidy levels have focused on the masking of deleterious recessive alleles or
have compared the rates of evolution of organisms with different ploidy levels (see Orr and Otto
1994, Otto 2007, and Otto and Gerstein 2008 for review). The theory developed in this paper will
instead focus on how higher ploidy levels can help organisms cope with protein aggregation. The
theory can potentially explain four different trends: 1) the frequent occurrence of polyploid
organisms in harsh environments, 2) the restriction of haploidy to relatively simple organisms, 3)
the relative stress tolerances of plant gametophytes and sporophytes, and 4) why complexity (and
diploidy) is associated with the production of sexual spores in plants and fungi. Thus, the theory
presented in this paper attempts to provide a complete theory of heterozygous advantage that unifies
our understanding of heterosis, selection for heterozygosity, and ploidy level.

2. Metabolic Heterosis 

2.1. Inbreeding and Metabolic Efficiency

Numerous studies have been published on a phenomenon that I will call “metabolic heterosis” in this
paper. Metabolic heterosis is the observed correlation between the growth rate of organisms and
their heterozygosity as measured by allozyme or microsatellite markers (Koehn and Shumway 1982;
Garton et al. 1984; Mitton 1985; Danzmann et al. 1987; Mitton 1993; Hedgecock et al. 1996; Pogson
and Fevolden 1998; Bayne et al. 1999; Hawkins and Day 1999; Hawkins et al. 2000; Bayne 2004; Borrell
et al. 2004; Pujolar et al. 2005; Ketola and Kotiaho 2009). Some of these authors maintain that
their correlations indicate a relationship between heterozygosity and metabolic efficiency (Garton
et al. 1984; Mitton 1993; Mitton 1997; Pogson and Fevolden 1998; Borrell et al. 2004). While most
studies have focused on the size and mass of the organism, metabolic efficiency can affect other
fitness traits. For instance, Gajardo and Beardmore (1989) and Gajardo et al. (2001) have shown a
positive correlation between heterozygosity and the percentage of female Artemia that produce
energetically expensive encysted offspring rather than energetically cheaper nauplii. While many
studies have concluded that heterozygosity is correlated with the metabolic efficiency of organisms,
there is still no consensus view on the underlying mechanism behind this correlation. Furthermore,
there is an additional debate over whether the fitness parameters are correlated with the
heterozygosity of the organism across all gene loci (the “general effect” hypothesis) or only with
the heterozygosity of gene loci near the measured genetic markers (the “local effect” hypothesis)
(Mitton and Pierce 1980; Balloux et al. 2004; Szulkin et al. 2010).

The correlation between heterozygosity and metabolic efficiency may be explained by protein turnover
(Hawkins et al. 1986; Hawkins et al. 1989; Hedgecock et al. 1996; Bayne 2004). Hawkins et al. (1986)
showed that lower heterozygosity leads to higher levels of protein turnover in the blue mussel,
Mytilus edulis, using labeled food and allozyme markers. Protein turnover refers to an organism’s
daily degradation and synthesis of proteins, both of which are energy consuming processes.
Therefore, these papers argued, an inbred organism’s biomass may be more energetically expensive to
sustain than an outbred organism’s due to higher levels of protein turnover.

One of the processes that enhances protein turnover is protein aggregation. Most proteins consist of
polypeptide chains that must fold into a native conformation in order to be functional. Many
proteins, especially metastable proteins containing intrinsically unstructured regions (IURs), are
continuously unfolding and refolding in an organism (Olzscha et al. 2011; also see below). However,
as shown in Figure 1, there are two alternative pathways that unfolded polypeptide chains may take.
First, the polypeptide chain may refold into its correct conformation and become a functional
protein. Alternatively, the polypeptide chain may bind with another unfolded chain and form a
soluble oligomer (Silow and Oliveberg 1997; Bitan et al. 2001; Kayed et al. 2003; Kayed et al. 2004;
Cleary et al. 2005; Haass and Selkoe 2007; Vieira et al. 2007; Wei et al. 2001). Soluble oligomers
can then bind with additional unfolded chains and eventually become a solid protein aggregate. The
formation of soluble oligomers and solid protein aggregates is detrimental for two reasons. First,
protein aggregation competes with the proper folding of a protein (Kiefhaber et al. 1991). A
protein’s folding efficiency decreases when more unfolded polypeptide chains bind to each other.
Second, soluble oligomers and solid protein aggregates are cytotoxic species that have been
associated with several disorders (Kayed et al. 2003; Haass and Selkoe 2007; Vieira et al. 2007;
Shankar et al. 2008). Therefore, the viability of organisms depends on the ability of their proteins
to maintain their correct conformations and avoid aggregation.

Figure 1: An unfolded polypeptide chain may either refold into its
native conformation or bind to another unfolded polypeptide chain.
The two reactions compete against each other, and the relative rates of
each reaction will determine the folding efficiency of the polypeptide
chain. The shaded areas on the polypeptide chain represent binding
sites that stabilize the folded protein or soluble oligomer.


For this reason, all organisms produce numerous molecular chaperones that prevent unfolded
polypeptide chains from aggregating. Molecular chaperones can bind to unfolded polypeptide chains,
thereby preventing their aggregation, or they can tag the polypeptide chains with ubiquitin, which
marks the polypeptide chains for destruction by the proteasome (Hayes and Dice 1996; Kopito 2000;
Maurizi 2002; McClellan et al. 2005; Kaganovich et al. 2008). The proteasomal system also degrades
proteins after they have aggregated (Dougan et al.; Liberek et al. 2008; Tetzlaff et al. 2008).
Finally, protein aggregates may also be degraded via autophagy, whereby aggregated proteins are
transported to lysosomes and digested (Kopito 2000; Garcia-Mata et al. 2002; Kruse et al. 2006;
Yorimitsu and Klionsky 2007). Choe and Strange (2008) observed that half of the genes up-regulated
when the nematode C. elegans is exposed to aggregate-promoting environmental stresses are associated
with protein degradation. Especially up-regulated were genes associated with proteasomal and
lysosomal degradation. New polypeptide chains will have to be synthesized to take the place of
degraded chains, so high rates of protein aggregation can result in high rates of protein turnover.

The expression of molecular chaperones is correlated with the heterozygosity of organisms.
Kristensen et al. (2002) and Pedersen et al. (2005), using an enzyme-linked immunosorbent assay,
found that inbred fruit flies (D. melanogaster and Drosophila buzzatii) synthesized more heat shock
proteins (Hsps) than outbred fruit flies at benign and elevated temperatures. Since Hsps are a type
of molecular chaperone, the authors of these papers concluded that inbred fruit flies contain a
higher number of unfolded or misfolded polypeptide chains, and potentially higher rates of protein
aggregation, than outbred fruit flies, even at benign temperatures. Thus, low levels of
heterozygosity may result in higher levels of protein aggregation, which in turn can result in
higher rates of protein turnover and lower metabolic efficiencies. (Chen et al. (2006) have
supported these findings with similar experiments performed on Pacific abalone populations.)

Kristensen et al. (2002) and Kristensen et al. (2009) used the dominance theory of inbreeding
depression to explain the correlation between heterozygosity and lower Hsp concentrations. They
argued that proteins encoded by deleterious recessive alleles may be less stable, and more prone to
aggregation, than the proteins encoded by normal alleles. Consequently, the increased expression of
deleterious recessive alleles by inbred organisms may increase their demand for molecular
chaperones. The following subsection will develop an overdominance theory to explain the
correlations between heterozygosity and Hsp concentrations, protein turnover, and metabolic
efficiency. The overdominant explanation will then be extended in subsequent sections to provide a
biochemical basis for truncation selection that favors heterozygosity. Afterwards, the truncation
selection model will be used to explain why higher ploidy levels are advantageous in certain
circumstances.


2.2. Model 

A heuristic model is developed in this subsection that shows how an organism’s heterozygosity can
influence its expression of molecular chaperones, protein turnover, and metabolic efficiency. This
model shows how protein aggregation reactions can provide a physical basis for overdominance and
metabolic heterosis, which may explain the results of some breeding experiments (Li et al. 2001;
Zhou et al. 2012). The model will also show that the relationship between heterozygosity and
metabolic efficiency should be linear (additive epistasis) as described in several studies (Koehn
and Shumway 1982; Garton et al. 1984; Mitton and Grant 1984; Hawkins et al. 1986). This contrasts
with the usual exponential relationship between heterozygosity and phenotype anticipated by
multiplicative epistasis (Charlesworth and Willis 2009).

The model will assume that an organism maintains steady-state concentrations of functional proteins
that are continuously unfolding and either refolding or forming soluble oligomers. Soluble oligomers
must be degraded by the organism when they form because they are toxic. Also, the organism must
synthesize new proteins to replace those that were removed when the organism destroyed its soluble
oligomers. The result is three steady-state concentrations, $[F]_{steady}$, $[U]_{steady}$, and
$[O]_{steady}$, which are the concentrations of folded protein, unfolded protein, and soluble
oligomer, respectively. In this model, it will be assumed that the organism maintains soluble
oligomers at a critical steady-state concentration. If the steady-state concentration of soluble
oligomers rises, then the organism will increase its concentration of molecular chaperones to lower
the soluble oligomers’ steady-state concentration back to its critical level. This may be
accomplished through a feedback mechanism, such as the unfolded protein response (UPR) that occurs
inside the endoplasmic reticulum (Schroder and Kaufman 2005; Bernales et al. 2006). As a consequence
of the steady-state assumption, the rates of soluble oligomer formation and degradation will
primarily depend on the rate that unfolded polypeptide chains are introduced into the system (see
Equation 10 below), which is similar to the in vivo model described in Kiefhaber et al. (1991).

Another assumption for the model is that the initial binding reactions are the rate-limiting step in
the various protein aggregation pathways, and that molecular chaperones prevent the accumulation of
protein aggregation products “downstream” from the initial binding reactions (see Dobson (2003) for
an overview of the many aggregation pathways that can occur). This approach has been used by other
researchers to successfully model protein aggregation dynamics both in vitro and in vivo (Kiefhaber
et al. 1991; Hasegawa et al. 1999; Borgia et al. 2013). As a consequence of this assumption, the
kinetics of soluble oligomer formation will follow second-order rate laws in the model.

The first thing to consider is the rate at which an unfolded polypeptide chain folds into its native
conformation. The polypeptide chain may be unfolded because it has been recently synthesized or
because a previously native protein unraveled. The latter process may be part of the protein’s
normal condition (perhaps because it contains intrinsically unstructured regions) or may be induced
by environmental stress. Regardless, most unfolded polypeptide chains must fold into their correct
conformation in order to be functional. This takes time, especially if the folding chain becomes
trapped in a metastable intermediate state (Onuchic et al. 1995; Levy et al. 2005; Nevo et al.
2005). Nevertheless, folding (and refolding) proceeds according to a first-order rate law (Kiefhaber
et al. 1991):

$$\frac{d[N]}{dt} = k_f[U] \tag{1}$$

where $[N]$ is the concentration of native protein, $t$ is time, $k_f$ is the rate constant for the
folding reaction, and $[U]$ is the concentration of unfolded polypeptide chains.

Alternatively, the unfolded polypeptide chain may bind to another and form a soluble oligomer, which
may serve as a seed for protein aggregation. The process of protein aggregation is highly specific
in that protein aggregates are highly enriched with a single protein species, even when two or more
polypeptides are aggregating simultaneously (London et al. 1974; Speed et al. 1996; Kopito 2000;
Rajan et al. 2001; Morell et al. 2008). The process may be so specific that small differences in
amino acid sequence can inhibit co-aggregation of different polypeptide chains. For example, Mead et
al. (2003), O’Nuallain et al. (2004), and Apostol et al. (2010) have found that a single point
mutation can prevent amyloid fibrils from co-aggregating. Other researchers have similarly found
that amyloid formation can be inhibited in mixtures of polypeptide variants (Hasegawa et al. 1999;
Rochet et al. 2000; Lashuel et al. 2003; Yagi et al. 2005; Lewis et al. 2006; Tahiri-Alaoui et al.
2006).


Both O’Nuallain et al. (2004) and Apostol et al. (2010) have argued that the specificity of amyloid
formation may be due to changes in amyloid conformation brought about by point mutations. King et
al. (1996), Sinha and Nussinov (2001), and Xu et al. (2013) have shown that point mutations can
change the probabilities of polypeptide chains assuming particular conformations without altering
the stability or conformation of the native protein. Thus, point mutations may reduce the
probability of two allozymes assuming similar conformations, which would decrease the likelihood of
them aligning properly in order to co-aggregate (O’Nuallain et al. 2004; Ma and Nussinov 2012). In
addition, point mutations may introduce incompatibilities (e.g. steric repulsion between side
chains) that prevent two polypeptide chains from co-aggregating (Apostol et al. 2010). Note that
some polymorphisms do co-aggregate (see Wright et al. (2005) and Krebs et al. (2004)), but they are
not likely to persist in natural populations since they do not confer any heterozygous advantage.
Also, the specificity of protein aggregation may only apply to proteins with certain physical
properties. This paper will discuss the importance of proteins with intrinsically unstructured
regions (IURs) in Sections 3.4 and 4.2. However, this is not a major limitation since balancing
selection would only apply to a fraction of all protein coding gene loci.

The specificity of protein aggregation implies that the formation of soluble oligomers proceeds as a
second-order reaction (Kiefhaber et al. 1991; Bitan et al. 2001; Zhdanov and Kasemo 2004; Zhu et al.
2010). For example, the rate law for the formation of a dimer will be:

$$\frac{d[O]}{dt} = k_b[U]^2 \tag{2}$$

where $[O]$ is the concentration of the soluble oligomer and $k_b$ is the rate constant for the
self-binding reaction.

A comparison of Equations 1 and 2 reveals that the rate of soluble oligomer formation is more
dependent upon the concentration of unfolded polypeptide chains than is the rate of protein folding.
For example, an individual that is heterozygous at a given gene locus will synthesize two different
allozymes, which should have approximately half the concentration that they would have if the
individual were homozygous at the gene locus. From Equation 1 it follows that each allozyme will
fold half as fast, and the combined rate of folding will be about the same for heterozygous and
homozygous individuals. This can be expressed generally as:

$$\frac{d[N]}{dt} = \sum_i r_i k_{f,i}[U_i] \tag{3}$$

where $r_i$ is the actual concentration of allozyme i in an individual divided by the concentration
of i if the individual were homozygous for i, so that $r_i[U_i]$ is the actual concentration ($r_i$
will have a value of 1 in an organism that is homozygous for i, a value of 0 in an organism that
does not produce i, and a value of ≈0.5 in an organism that is heterozygous for i). In contrast,
soluble oligomers of an allozyme will form at one-quarter the rate in an individual that is
heterozygous for the allozyme compared with an individual that is homozygous for the allozyme
($rate_{het} \approx 0.25\,rate_{hom}$). The combined rate of soluble oligomer formation for
allozymes in heterozygous individuals will be approximately half the rate for homozygous individuals
($0.25\,rate_{hom} + 0.25\,rate_{hom} = 0.5\,rate_{hom}$), or more generally:

$$\frac{d[O]}{dt} = \sum_i r_i^2 k_{b,i}[U_i]^2 \tag{4}$$

Equations 1-4 show that diluting the concentration of an unfolded polypeptide chain shifts the
competition between protein folding and soluble oligomer formation in favor of folding. Thus,
heterozygosity increases the folding efficiency of unfolded polypeptide chains simply by diluting
their concentrations.
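
The dilution argument can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names, the unit rate constants `k_f` and `k_b`, and the unit homozygous concentration `u_hom` are assumptions chosen for convenience, not values from the literature. It sums the first-order folding rate and the second-order self-binding rate over the allozymes at one locus for a homozygote (one allozyme, r = 1) and a heterozygote (two allozymes, r = 0.5 each).

```python
def folding_rate(ratios, k_f=1.0, u_hom=1.0):
    """First-order folding summed over allozymes: sum_i r_i * k_f * [U]."""
    return sum(r * k_f * u_hom for r in ratios)


def oligomer_rate(ratios, k_b=1.0, u_hom=1.0):
    """Second-order self-binding summed over allozymes: sum_i k_b * (r_i * [U])**2."""
    return sum(k_b * (r * u_hom) ** 2 for r in ratios)


homozygote = [1.0]         # one allozyme at full concentration
heterozygote = [0.5, 0.5]  # two allozymes, each at about half concentration

fold_ratio = folding_rate(heterozygote) / folding_rate(homozygote)
oligo_ratio = oligomer_rate(heterozygote) / oligomer_rate(homozygote)
print(fold_ratio, oligo_ratio)  # prints: 1.0 0.5
```

Under these assumptions the heterozygote folds at the same combined rate as the homozygote but forms soluble oligomers at half the rate, which is the shift in favor of folding described above.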

Another way to think of the influence that heterozygosity has on soluble oligomer formation is to
consider the number of collisions that will occur in a given time period. In a homozygous organism,
all of the collisions will be between copies of the same allozyme. In contrast, 50% of the
collisions in a heterozygous organism will be between alternate allozymes. Each allozyme buffers the
soluble oligomer formation reaction of the other. The critical assumption is that soluble oligomer
formation is highly specific, which is corroborated by the research papers cited above.
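
The collision picture can also be sketched numerically. The snippet below assumes a well-mixed pool in which the chance that an encounter involves allozymes i and j is proportional to the product of their concentrations; the function name and the example genotypes are hypothetical. Under that assumption, the share of encounters between two copies of the same allozyme is the sum of the squared concentration shares.

```python
def same_allozyme_fraction(concentrations):
    """Fraction of random pairwise encounters between two copies of the same allozyme."""
    total = sum(concentrations)
    if total <= 0:
        raise ValueError("at least one allozyme must be present")
    return sum((c / total) ** 2 for c in concentrations)


examples = {
    "homozygote": [1.0],
    "heterozygote": [0.5, 0.5],
    "three alleles": [1 / 3, 1 / 3, 1 / 3],
    "unequal expression": [0.7, 0.3],
}
for label, genotype in examples.items():
    print(label, round(same_allozyme_fraction(genotype), 3))
```

The fraction is 1 for a homozygote, 0.5 for a heterozygote with equally expressed alleles, and about 0.33 for three equally expressed alleles, so each additional allozyme dilutes the productive (same-allozyme) collisions further. Unequal expression weakens the buffering.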

Equation 4 gives the rate of dimer formation at a single gene locus. The total rate of soluble
oligomer addition, $dA/dt$, for all alleles at all gene loci is:

$$\frac{dA}{dt} = \sum_j \sum_i k_{b,ji}\, r_{U,ji}^2 [U_{ji}]^2
+ \sum_j \sum_i k_{b2,ji}\, r_{D,ji}\, r_{U,ji} [D_{ji}][U_{ji}]
+ \sum_j \sum_i k_{b3,ji}\, r_{T,ji}\, r_{U,ji} [T_{ji}][U_{ji}] + \ldots \tag{5}$$

where $[U_{ji}]$ is the concentration of unfolded polypeptide chain expressed by each allele i at
each gene locus j, $[D]$ is the concentration of dimer, $[T]$ is the concentration of trimer, $r$ is
the concentration of a chemical species (U, D, or T) in an organism that is heterozygous for ji
divided by its concentration in an organism that is homozygous for ji, and $k_{b,ji}$ is the rate
law constant for the binding reaction of each unfolded polypeptide chain encoded by each allele at
each gene locus.
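
As a rough illustration of the sum over loci and alleles, the sketch below keeps only the dimerization term and gives every allele the same rate constant and homozygous concentration. These simplifications, the representation of a genotype as a list of per-locus concentration ratios, and the helper names are all assumptions made for the example.

```python
def dimer_addition_rate(genotype, k_b=1.0, u_hom=1.0):
    """Sum k_b * (r * [U])**2 over every allele at every locus (dimer term only)."""
    return sum(k_b * (r * u_hom) ** 2 for locus in genotype for r in locus)


def make_genotype(n_loci, n_het):
    """n_het heterozygous loci (r = 0.5, 0.5); the remaining loci are homozygous (r = 1)."""
    if not 0 <= n_het <= n_loci:
        raise ValueError("n_het must lie between 0 and n_loci")
    return [[0.5, 0.5] for _ in range(n_het)] + [[1.0] for _ in range(n_loci - n_het)]


rates = [dimer_addition_rate(make_genotype(10, n)) for n in (0, 5, 10)]
print(rates)  # prints: [10.0, 7.5, 5.0]
```

With ten loci, the fully homozygous genotype adds soluble oligomers at twice the rate of the fully heterozygous one, and each heterozygous locus lowers the total by the same amount.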

For the purpose of simplicity, it is assumed in the rest of this subsection that a single molecular
chaperone is responsible for removing soluble oligomers. The rate of removal can be expressed in
terms of the Michaelis-Menten equation for the molecular chaperone (Kondepudi 2008):

$$\frac{dR}{dt} = \frac{R_{max}[O]_{steady}}{K_m + [O]_{steady}} \tag{6}$$

where $R_{max}$ is the chaperone’s maximum rate of removal, $[O]_{steady}$ is the steady-state
concentration of soluble oligomer, and $K_m$ is the Michaelis-Menten constant for the molecular
chaperone. The value of $R_{max}$ is directly proportional to the concentration of molecular
chaperone:

$$R_{max} = k_2 M_t \tag{7}$$

where $M_t$ is the total concentration of molecular chaperone and $k_2$ is the rate law constant for
the molecular chaperone’s catalyzing reaction. Combining Equations 6 and 7, setting the rate of
removal equal to the rate of soluble oligomer addition at steady state ($dR/dt = dA/dt$), and
solving for $M_t$ gives:

$$M_t = \left(\frac{K_m + [O]_{steady}}{k_2[O]_{steady}}\right)\frac{dA}{dt} \tag{8}$$

Equation 8 gives the concentration of molecular chaperone necessary to maintain a particular
steady-state concentration of soluble oligomers at a given heterozygosity. It predicts that inbred
organisms will have higher concentrations of molecular chaperones than outbred organisms, which has
been confirmed by Kristensen et al. (2002).
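
The chaperone requirement in Equation 8 can be sketched as follows. The parameter values are arbitrary placeholders and the function names are hypothetical; the only modeling assumption is the one stated in the text, that removal by a single Michaelis-Menten chaperone balances the rate of soluble oligomer addition at steady state.

```python
def removal_rate(o_steady, m_t, k_m, k_2):
    """Michaelis-Menten removal: R_max * [O] / (K_m + [O]) with R_max = k_2 * M_t."""
    return k_2 * m_t * o_steady / (k_m + o_steady)


def chaperone_required(addition_rate, o_steady, k_m, k_2):
    """Chaperone concentration at which removal balances a given addition rate."""
    if o_steady <= 0 or k_2 <= 0:
        raise ValueError("o_steady and k_2 must be positive")
    return (k_m + o_steady) / (k_2 * o_steady) * addition_rate


# Placeholder values: the inbred line adds oligomers twice as fast as the outbred line.
m_inbred = chaperone_required(addition_rate=10.0, o_steady=1.0, k_m=2.0, k_2=1.0)
m_outbred = chaperone_required(addition_rate=5.0, o_steady=1.0, k_m=2.0, k_2=1.0)
print(m_inbred, m_outbred)
```

Because the required chaperone concentration scales linearly with the addition rate, holding the oligomer concentration at the same critical level costs the inbred line twice as much chaperone in this example.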


The degradation of soluble oligomers must be balanced by the synthesis of new proteins in order to
maintain a steady-state concentration of properly functioning protein within the cytosol:

$$\frac{dP}{dt} = \nu\frac{dR}{dt} = \nu\frac{dA}{dt} \tag{9}$$

where $dP/dt$ is the rate that new proteins are synthesized to replace proteins that have formed
soluble oligomers and $\nu$ is the number of proteins making up the soluble oligomers. Together,
these equations predict that inbred organisms should have higher rates of protein turnover than
outbred organisms, which has been confirmed by Hawkins et al. (1986).

Finally, the removal of soluble oligomers and their replacement with new proteins is an energy
consuming process. The overall calorie consumption rate due to protein maintenance,
$dC_{Maint}/dt$, is:

$$\frac{dC_{Maint}}{dt} = \sum_j \sum_i (\beta_{ji} + 2\gamma_{ji})\, k_{b,ji}\, r_{U,ji}^2 [U_{ji}]^2
+ \sum_j \sum_i (\beta_{ji} + 3\gamma_{ji})\, k_{b2,ji}\, r_{D,ji}\, r_{U,ji} [D_{ji}][U_{ji}]
+ \sum_j \sum_i (\beta_{ji} + 4\gamma_{ji})\, k_{b3,ji}\, r_{T,ji}\, r_{U,ji} [T_{ji}][U_{ji}] + \ldots \tag{10}$$

where $\beta$ is the number of calories consumed during the degradation of each soluble oligomer and
$\gamma$ is the number of calories consumed during the synthesis of each replacement protein.
Equation 10 predicts that inbred organisms should be less metabolically efficient than outbred
organisms, which has been confirmed by numerous studies (see Mitton 1997 for review). A simplified
version of Equation 10 is obtained by assuming that only the dimerization reaction has a
non-negligible rate, and by assigning every aggregating protein the same value for $k_b$ and $[U]$,
which are then combined into a single parameter, $k_1$:

$$\frac{dC_{Maint}}{dt} = k_1[\beta + 2\gamma][1 - 0.5N_{Het}] \tag{11}$$

where $N_{Het}$ is the number of heterozygous gene loci. As shown in Figure 2, Equation 11 is linear
with respect to heterozygosity, which means that the overdominant loci exhibit modest epistasis for
metabolic heterosis; otherwise the equation should decline exponentially with heterozygosity
(Charlesworth and Willis 2009). A linear relationship between metabolic efficiency (as measured by
oxygen consumption and growth rate) and heterozygosity has been found for several species of marine
bivalves (Koehn and Shumway 1982; Garton et al. 1984; Mitton and Grant 1984; Hawkins et al. 1986).

Figure 2: Results from Equation 11 showing that metabolic efficiency
increases with heterozygosity. The dashed curve shows the results for
a higher $k_1$ value, which increases with both the stressfulness of the
environment and the abundance of aggregation-prone proteins synthesized
by the organism.
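
The linearity of the simplified maintenance cost can be illustrated with a short calculation. The sketch below is one possible reading, not the paper's own implementation: it assumes a total of `n_loci` aggregation-prone loci, so that summing the dimer term over homozygous loci (weight 1) and heterozygous loci (weight 0.5) gives a bracket of `n_loci - 0.5 * n_het`; dividing by `n_loci` and expressing heterozygosity as a fraction recovers the form 1 - 0.5 N_Het. The parameter values are placeholders.

```python
def maintenance_cost(n_loci, n_het, k_1=1.0, beta=1.0, gamma=1.0):
    """Calorie cost of dimer removal and replacement, summed over loci.

    Assumes homozygous loci contribute k_1 * (beta + 2 * gamma) each and
    heterozygous loci contribute half of that.
    """
    if not 0 <= n_het <= n_loci:
        raise ValueError("n_het must lie between 0 and n_loci")
    return k_1 * (beta + 2 * gamma) * (n_loci - 0.5 * n_het)


costs = [maintenance_cost(10, n) for n in range(11)]
steps = [b - a for a, b in zip(costs, costs[1:])]
print(costs[0], costs[-1], set(steps))
```

Every additional heterozygous locus lowers the cost by the same increment, which is the linear (rather than exponential) relationship described above. Raising `k_1`, as for a more stressful environment, steepens the slope without changing its shape.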


Equations 1-10 provide a heuristic model for the relationship between protein aggregation and
metabolic heterosis. The model shows that heterozygosity can be beneficial even in the absence of
deleterious mutations, and is in agreement with previous research that found a correlation between
heterozygosity and: (1) metabolic efficiency (Koehn and Shumway 1982; Garton et al. 1984; Mitton
1997), (2) protein turnover (Hawkins et al. 1986; Hawkins et al. 1989), and (3) expression of
molecular chaperones (Kristensen et al. 2002; Pedersen et al. 2005). The model predicts that
inbreeding depression should increase with the stressfulness of the environment because the values
of $k_b$ and $[U]_{steady}$ for each allele should increase with the stressfulness of the
environment (see Subsection 3.3). This is also supported by previous research (see Armbruster and
Reed 2005 for review).


2.3. Thermodynamic Considerations 

It may be helpful to conceptualize the underlying 
physics of the model developed in this paper. The model 
assumes steady-state conditions in which the rates of 
soluble oligomer addition, soluble oligomer removal, 
and synthesis of replacement proteins are equal (Equa¬ 
tion 1^. This results in steady-state concentrations of 
unfolded polypeptide chains, soluble oligomers, and native proteins. These steady-state concentrations are not in chemical equilibrium, otherwise the reactions for soluble oligomer formation would not proceed forward. For example, the fact that two unfolded polypeptide chains bind to each other to form dimers suggests that they have greater chemical potentials than the dimers (i.e. $2\mu_{unfolded} > \mu_{dimer}$). We can express this in terms of chemical affinity, $A$, which is the difference in chemical potential between reactants and products (Kondepudi 2008):

$$A = 2\mu_{unfolded} - \mu_{dimer} > 0 \tag{12}$$

The greater the chemical affinity of a reaction, the further away it is from equilibrium.

However, the aggregation reactions proceed forward more slowly in hybrid organisms than in non-hybrids because the hybrids produce more allozymes. Point mutations can change the probability distributions for polypeptide chains assuming particular conformations without affecting the stability of the polypeptides' native conformations (King et al. 1996; Sinha and Nussinov 2001; Xu et al. 2013). This lowers the probability of two allozymes co-aggregating because polypeptide chains must have similar conformations in order to form the necessary cross-β interactions that allow them to bind to each other (O'Nuallain et al. 2004). Point mutations in polypeptide chains can also introduce incompatibilities (e.g. steric repulsion between side chains) that prevent two chains from co-aggregating (Apostol et al. 201?). Therefore, hybrid organisms benefit from lower rates of protein aggregation due to their greater degree of “mixedupness.”

This can be thought of in terms of chemical potential and chemical affinity. The chemical potentials of the unfolded chains and dimers are related to their concentrations by:

$$\mu_{unfolded} = \mu^{\circ}_{unfolded} + kT \ln[U] \tag{13a}$$

$$\mu_{dimer} = \mu^{\circ}_{dimer} + kT \ln[D] \tag{13b}$$

where $\mu^{\circ}$ is the standard-state chemical potential, $kT$ is the thermal energy (the Boltzmann constant multiplied by the absolute temperature), $[U]$ is the concentration of unfolded polypeptide chain, and $[D]$ is the concentration of dimer. Since an organism's molecular chaperones work to keep the concentration of dimer low, and the concentration of unfolded polypeptide chain should be about half as much in an organism that is heterozygous for the polypeptide than in an organism that is homozygous for the polypeptide, the chemical affinity of the dimerization reaction should be lower in the heterozygous organism than in the homozygous organism. In other words, the homozygous organism is further away from chemical equilibrium.

Chemical affinity can be related to the rates of elementary step reactions by (Kondepudi 2008):

$$\frac{A}{kT} = \ln\left(\frac{R_f}{R_r}\right) \tag{14}$$

where $R_f$ is the rate of the forward reaction and $R_r$ is the rate of the reverse reaction. Finally, the relationship between the rate of progression and affinity for a chemical reaction is:

$$\frac{d\xi}{dt} = R_f\left(1 - e^{-A/kT}\right) \tag{15}$$

where the rate of chemical progression, $d\xi/dt$, is the difference between the forward and reverse reaction rates. Comparing Equations 12-15, it follows that the rate of progression for a polypeptide chain's dimerization reaction will be slower in an organism that is heterozygous for the polypeptide than in an organism that is homozygous for the polypeptide because the heterozygous organism is closer to chemical equilibrium.
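The homozygote/heterozygote comparison can be illustrated numerically. The sketch below works in units of $kT$, holds the dimer concentration fixed as a stand-in for chaperone activity, and assumes a mass-action forward rate $R_f = k_b[U]^2$. The standard-state affinity, the rate constant, and the concentrations are hypothetical values, not figures from the paper.

```python
import math

def affinity_over_kT(u, d, a0_over_kT):
    """A/kT for 2U -> D from Equations 12 and 13: A = A0 + kT*ln([U]^2/[D])."""
    return a0_over_kT + math.log(u ** 2 / d)

def net_rate(u, d, k_b, a0_over_kT):
    """Equation 15: d(xi)/dt = R_f * (1 - exp(-A/kT)), with R_f = k_b*[U]^2 assumed."""
    r_forward = k_b * u ** 2
    return r_forward * (1.0 - math.exp(-affinity_over_kT(u, d, a0_over_kT)))

# Hypothetical values: standard-state affinity of 5 kT, fixed low dimer concentration.
A0, K_B, D = 5.0, 1.0, 1e-3
U_HOM = 1.0          # homozygote: one polypeptide species at full concentration
U_HET = U_HOM / 2    # heterozygote: each allozyme at half concentration

a_hom = affinity_over_kT(U_HOM, D, A0)
a_het = affinity_over_kT(U_HET, D, A0)
rate_hom = net_rate(U_HOM, D, K_B, A0)
rate_het_total = 2 * net_rate(U_HET, D, K_B, A0)   # two allozymes, each self-aggregating
```

With these assumptions, halving $[U]$ lowers the affinity by $kT \ln 4$, and the combined dimerization rate of the two allozymes stays below the rate in the homozygote.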

Thus, organisms must expend energy to remove soluble oligomers because they maintain steady-state concentrations of native proteins, unfolded polypeptide chains, and soluble oligomers that are different from their equilibrium values. However, the aggregation reactions proceed toward chemical equilibrium more slowly in hybrid organisms than in non-hybrids, so the hybrids need a lower rate of calorie consumption to maintain steady-state conditions (Equation 10). An analogy can be drawn with refrigerators, which consume energy to maintain steady-state thermal gradients. The amount of power a refrigerator consumes increases with the difference between its inside and outside temperatures because the rate of inward heat flow increases with the refrigerator's thermal gradient. Likewise, the constant movements toward and away from chemical equilibrium are responsible for organisms' maintenance costs, and the different rates of these movements give rise to the differences in performance (growth rate, size, etc.) between hybrid and non-hybrid organisms.


3. Epistasis and Truncation Selection 


3.1. Inbreeding and Epistasis 


Subsection 2.2 discussed how heterozygosity affects an organism's metabolic performance when the organism maintains steady-state conditions. This subsection will consider the effect heterozygosity has on an organism's viability when the organism's defenses against protein aggregation are overwhelmed and steady-state conditions are no longer maintained. Some studies have found a steep drop in viability at high levels of inbreeding depression, especially in D. melanogaster (Kosuda 1972; Rumball et al. 1994; Figure 3). These results suggest that extinction of an inbreeding line is likely to occur when the level of inbreeding exceeds a threshold value (Frankham 1995). This might be evidence for synergistic epistasis between deleterious mutations (Kosuda 1972; Charlesworth 1998). However, it could also be evidence for epistasis between heterozygous gene loci, which Chelo and Teotónio (2012) have observed in their experiments. Several authors have shown that molecular chaperones can provide a biochemical basis for epistasis by inhibiting protein aggregation (Fares et al. 2002; Sollars et al. 2003; Maisnier-Patin et al. 2005; de Visser and Elena 2007; de Visser et al. 2011; Lehner 2011). This subsection expands on these ideas and develops a theory that provides a biochemical basis for the steep decline in viability that sometimes occurs with increasing inbreeding. Then it discusses the model's implications for selection for heterozygosity.

One of the reasons that protein aggregation is such a nuisance is that organisms seem to produce as much of a protein as they can just before it starts to aggregate. Tartaglia et al. (2007) provide evidence for this in a plot similar to that shown in Figure 4, which shows that the logarithm of a protein's aggregation rate is negatively correlated with the logarithm of its expression level. From this plot, Tartaglia et al. (2007) conclude that organisms have “no scope for dealing with any situation in which these [expression] levels rise further or whereby the aggregation rates are increased... In the context of protein solubility, therefore, we are constantly living our lives at the edge of a molecular precipice.” In other words, we should expect steep declines in fitness when protein homeostasis is perturbed and soluble oligomers accumulate within an organism. Since soluble oligomers are toxic substances (Kayed et al. 2003; Haass and Selkoe 2007; Vieira et al. 2007; Shankar et al. 2008), the equations used by toxicologists may be useful in modeling the fitness declines brought about by the accumulation of soluble oligomers.

Figure 3: (a) Reproduction of data in Fig. 2 of Rumball et al. (1994). The curve shows the number of offspring produced by D. melanogaster after multiple generations of full-sib mating. (b) Equation 16 gives the relative fitness of an organism with increasing concentration of soluble oligomers.

Figure 4: Reproduction of plot in Fig. 1 of Tartaglia et al. (2007). The curve shows that the expression level of a protein is negatively correlated with its aggregation rate (log scale). This may mean that organisms produce as much protein as they can before it starts to aggregate.

The responses of organisms to toxic substances typically follow log-normal distributions (Wagner and Løkke 1991; Wheeler et al. 200?; Newman and Unger 2003). Thus, the fitness of individuals with a given concentration of soluble oligomer can be obtained in a manner similar to that given in Kimura and Crow (1978):

$$W([O]) = W_{max} \int_{[O]}^{\infty} \frac{1}{C\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln C - \ln C_{50})^2}{2\sigma^2}\right) dC \tag{16}$$

where $W([O])$ is the fitness of individuals with total soluble oligomer concentration $[O]$, $W_{max}$ is the fitness of individuals that produce no soluble oligomers, $C$ is the concentration of soluble oligomer, $C_{50}$ is the concentration of soluble oligomer that kills half of the individuals in a population, and $\sigma$ is the shape parameter for the log-normal distribution.

I used a log-normal distribution for Equation 16 because log-normal distributions are commonly used in the toxicology literature, but any distribution that yields an S-shaped CCDF (e.g. normal, log-logistic, Weibull) can also be used in a truncation selection model. Use of a log-normal distribution would be justified if the toxicities of soluble oligomers are due to many random variables that result in multiplicative degradation (NIST/SEMATECH 200?), which would be true if, as hypothesized by others, their toxic effects are due to a general disruption of homeostasis caused by the permeabilization of cellular and organelle membranes (Kayed et al. 2003; Bucciantini et al. 2004; Kayed et al. 2004; Glabe 2006). A log-normal distribution would also be justified if the aggregation of regulatory and signaling proteins leads to a disruption in developmental homeostasis as postulated by I. Michael Lerner (see Section ?).


Equation 16 assumes that neither allele at a heterozygous gene locus negatively impacts the phenotype of its carrier. This may not be true and may be addressed with:

$$W([O]) = S([O]) \prod_i (1 - h_i s_i) \prod_j (1 - s_j) \tag{17}$$

where $S([O])$ is the right side of Equation 16, $s$ is the fitness cost of the inferior alleles, and $h$ is the dominance of the inferior alleles ($h = 1$ is completely dominant and $h = 0$ is completely recessive). $\prod(1 - hs)$ is the fitness cost of inferior alleles at heterozygous gene loci, and $\prod(1 - s)$ is the fitness cost of inferior alleles at homozygous gene loci.
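Equations 16 and 17 are straightforward to evaluate because the log-normal CCDF has a closed form in terms of the complementary error function. The following is a minimal sketch, not the author's implementation. The function names and the values of $C_{50}$, $\sigma$, $h$, and $s$ are hypothetical.

```python
import math

def fitness_eq16(o_conc, c50, sigma, w_max=1.0):
    """Equation 16: W_max times the log-normal CCDF evaluated at [O]."""
    if o_conc <= 0:
        return w_max                      # no soluble oligomer, no fitness loss
    z = (math.log(o_conc) - math.log(c50)) / sigma
    return w_max * 0.5 * math.erfc(z / math.sqrt(2))

def fitness_eq17(o_conc, c50, sigma, hs_het, s_hom, w_max=1.0):
    """Equation 17: Equation 16 scaled by the costs of inferior alleles.

    hs_het: iterable of h_i*s_i products for heterozygous loci
    s_hom:  iterable of s_j values for loci homozygous for an inferior allele
    """
    cost = math.prod(1 - hs for hs in hs_het) * math.prod(1 - s for s in s_hom)
    return fitness_eq16(o_conc, c50, sigma, w_max) * cost

# Hypothetical parameters for illustration only.
C50, SIGMA = 1.0, 0.25
curve = [(c, fitness_eq16(c, C50, SIGMA)) for c in (0.25, 0.5, 1.0, 2.0, 4.0)]
```

By construction the relative fitness is one half at $[O] = C_{50}$, and a smaller $\sigma$ makes the decline around $C_{50}$ steeper, which is what lets the curve approximate truncation selection.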

The relative fitness of an individual, $W([O])/W_{max}$, can be calculated once the total concentration of soluble oligomers in the individual is known (Figure 3b). There are two approaches to calculate the concentration of soluble oligomers for Equation 16. The first approach assumes steady-state conditions like in the previous section, but imposes an upper limit to either the amount of molecular chaperone an organism can produce or to the amount of energy available for the chaperone to perform its function.¹ The assumption of an upper limit on available energy is reasonable because there are limits to the amount of oxygen organisms can acquire from their environment (Pörtner 2001; Bickler and Buck 2007; Ramirez et al. 2007). This assumption may apply to organisms in moderately severe environments or to severely inbred animals (e.g. Rumball et al. 1994). Under such conditions, the rate of soluble oligomer removal may still equal the rate of soluble oligomer addition, but the effective concentration of molecular chaperone (the concentration of molecular chaperone activated by ATP) is constant, which results in a maximum rate of soluble oligomer removal, $R_{max}$. The result is:


Rmax 

The steady-state concentration of soluble oligomer increases as $dA/dt$ approaches the value of $R_{max}$. This will occur as the stressfulness of the environment increases or with decreasing heterozygosity.

The second approach assumes that steady-state conditions are disrupted, which would occur if $dA/dt > R_{max}$, or if an organism is unable to remove soluble oligomers once they have formed. There are several reasons why an organism would fail to maintain steady-state conditions:


1. The organism is dormant. Many organisms possess dormant resting stages, such as spores and cysts, that offer protection during adverse environmental conditions. These resting stages are usually associated with high concentrations of compatible solutes (e.g. trehalose) and small heat-shock proteins (sHSPs). Both compatible solutes and sHSPs can prevent unfolded polypeptide chains from binding to each other without consuming ATP. However, they cannot re-fold, disaggregate, or degrade misfolded proteins; they simply inhibit the formation of soluble oligomers (Singer and Lindquist 1998; Garay-Arroyo et al. 2000; Waters et al. 2008; van Leeuwen et al. 2013).

2. The organism does not have enough readily available energy to maintain protein homeostasis. All organisms have a finite energy supply available to them at any time. The ability of animals to generate ATP, for example, is limited by oxygen availability. Furthermore, several stresses are known to reduce an organism's ability to generate ATP. Thermal stresses can reduce the aerobic scope of ectothermic animals and desiccation can lead to a suspension of metabolism (Pörtner 2001; Alpert and Oliver 2002).

3. The organism is exposed to a sudden environmental shock. This can lead to elevated levels of protein aggregation. The organism should respond to the shock by synthesizing more molecular chaperones, but soluble oligomers can accumulate while the organism is adjusting to the new conditions.

4. The organism has an extracellular space. Multicellular organisms can secrete proteins into their extracellular space, which has orders of magnitude lower ATP than the intracellular space. Little is currently known about the process of preserving protein homeostasis in the extracellular space, but animals appear to produce extracellular chaperones that inhibit formation of soluble oligomers in a manner similar to sHSPs (Poon et al. 2002; Mannini et al. 201?; Wyatt et al. 2013).

¹ Some chaperones, such as Hsp70, require ATP to bind to unfolded polypeptide substrates (Patterson and Hohfeld 2006; Lotz et al. 2010).
In this case, the net rate of soluble oligomer accumulation is given by:

$$\left(\frac{dA}{dt}\right)_{net} = \frac{dA}{dt} - \frac{dR}{dt} \tag{19}$$

Integrating Equation 19 will give the concentration of soluble oligomer at a given time, but that requires knowledge of how $[U]$ and $k_b$ change with time for each polypeptide chain. The value of $dR/dt$ will also change with time if the organisms respond to the environmental shock by increasing the concentrations of their molecular chaperones. Nevertheless, an $[O]$ value can be used in Equation 16 once it has been obtained. This would give the survivorship of individuals with different heterozygosities after exposure to an environmental shock for a given length of time. Equation 19 does not yield simple solutions like the steady-state equations, but exposures to suddenly elevated stresses may be a more frequent source of viability declines in natural environments.
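Although Equation 19 has no simple closed form, it can be integrated numerically once time courses for the two rates are supplied. The sketch below uses forward-Euler integration with two hypothetical rate functions: a constant post-shock addition rate scaled like Equation 11, and a removal capacity that rises linearly as chaperones are induced. Neither rate law nor any of the numbers comes from the paper.

```python
def accumulate_oligomer(addition_rate, removal_rate, t_end, dt=0.01, o_start=0.0):
    """Forward-Euler integration of Equation 19 (net rate = dA/dt - dR/dt).

    addition_rate and removal_rate are callables of time t; both are
    hypothetical stand-ins for the paper's rate laws.
    """
    o_conc, t = o_start, 0.0
    n_steps = int(round(t_end / dt))
    for _ in range(n_steps):
        net = addition_rate(t) - removal_rate(t)
        o_conc = max(0.0, o_conc + net * dt)   # a concentration cannot go negative
        t += dt
    return o_conc

def shock_addition(k1, n_het, total_loci):
    """Constant post-shock addition rate, scaled like Equation 11."""
    return lambda t: k1 * (total_loci - 0.5 * n_het)

def induced_removal(r_start, r_max, ramp_time):
    """Removal capacity that rises linearly to r_max as chaperones are induced."""
    return lambda t: min(r_max, r_start + (r_max - r_start) * t / ramp_time)

# Hypothetical scenario: 100 loci, removal capacity catches up after 5 time units.
removal = induced_removal(r_start=40.0, r_max=120.0, ramp_time=5.0)
o_inbred = accumulate_oligomer(shock_addition(1.0, 0, 100), removal, t_end=5.0)
o_outbred = accumulate_oligomer(shock_addition(1.0, 60, 100), removal, t_end=5.0)
```

In this scenario the fully homozygous individual still carries a substantial oligomer load at the end of the exposure while the more heterozygous individual has cleared its load. Passing each value to Equation 16 would convert that difference into a difference in survivorship.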

Since the rate of soluble oligomer addition is predicted to increase with decreasing heterozygosity, the above model predicts that viability should decrease as heterozygosity decreases under a given set of environmental conditions. In general, the steep decline in viability should occur at higher heterozygosities with increasing environmental harshness. This may explain why hybrid crops are more drought tolerant than non-hybrids (Duvick 2001), and why a threshold survivorship is seen in some inbreeding studies (Rumball et al. 1994; Frankham 1995). The model also has relevance for explaining truncation selection for heterozygosity.


3.2. Truncation Selection for Heterozygosity 

Lewontin and Hubby (1966) provided an influential argument against the hypothesis that natural selection favors heterozygous genotypes. They measured the amount of allozyme diversity in wild Drosophila pseudoobscura populations and found polymorphisms segregating at approximately one-third (about 2000) of D. pseudoobscura's gene loci. Lewontin and Hubby (1966) argued that such a large number of polymorphisms could not be maintained by balancing selection without enormous fitness costs. For example, if homozygosity at a single gene locus reduces the reproductive potential of an individual by 10%, and only two polymorphisms are segregating at the gene locus at a frequency of 50% each, then the reproductive potential of the whole population will be reduced by 5% for each gene locus. The reproductive fitness of the population would be $0.95^{2000}$, or roughly $10^{-45}$, of its maximum value. This is an unrealistically low number, and they concluded that natural selection could not favor heterozygotes.
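The arithmetic behind this argument is easy to reproduce. The sketch below assumes, as the example in the text does, that fitness effects multiply independently across loci and that half of the population is homozygous at each locus. The function name is hypothetical.

```python
import math

def population_fitness(n_loci, cost_homozygote=0.10, freq_homozygote=0.5):
    """Multiplicative fitness across independent overdominant loci.

    With two alleles at 50% each, half the population is homozygous at a
    locus, so the mean reduction per locus is 0.10 * 0.5 = 0.05.
    Returns (fitness, log10 of fitness).
    """
    per_locus = 1.0 - cost_homozygote * freq_homozygote
    log10_w = n_loci * math.log10(per_locus)
    return per_locus ** n_loci, log10_w

w, log10_w = population_fitness(2000)
print(f"relative fitness = {w:.2e} (log10 = {log10_w:.1f})")
```

Working with the logarithm avoids any concern about floating-point underflow if the number of loci is increased, and it makes the exponential dependence on the number of loci explicit.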

Shortly after Lewontin and Hubby (1966) was published, three papers responded with a similar solution to the problem it raised (King 1967; Milkman 1967; Sved et al. 1967; Crow 19??). These papers proposed that truncation selection could maintain a large number of polymorphisms in natural populations without unreasonable fitness costs. In their models, all individuals whose heterozygosities are below a critical value have a fitness of zero, and individuals whose heterozygosities are above the critical value have maximum fitness (Figure 5a). Wills (1978) expanded on these models and showed that the number of polymorphisms that can be maintained in a population by truncation selection depends on the effective population size. However, he found that truncation of the least heterozygous individuals (the bottom 5%) in a population can maintain polymorphisms at 66,000 gene loci in a population of 100,000 individuals, which is more than enough to support the number of polymorphisms that actually occur in natural populations. The effectiveness of truncation selection comes from its severity and its ability to operate on many gene loci simultaneously. Several studies have found evidence for truncation selection acting on natural populations, but the results have been mixed (see Mitton 1997 for review and Kaeuffer et al. 2007 as a recent example).

Figure 5: (a) Idealized truncation selection curve. The relative fitness of an organism is 1 above a critical heterozygosity and 0 below the critical heterozygosity. (b) Truncation selection for heterozygosity generated by combining Equations 16 and 18. The two curves show truncation selection for different levels of environmental stress as quantified by a $k_1$ parameter (see main text). (c) Equation 17 incorporates the possibility that one of the alleles at a gene locus negatively impacts the phenotype of its carrier. (x-axis label: inbreeding coefficient.)

Truncation selection can be considered an extreme form of epistasis in which heterozygosity confers decreasingly small fitness gains with each additional heterozygous gene locus. Both Rumball et al. (1994) and Chelo and Teotónio (2012) have provided experimental evidence that such epistasis exists. The epistasis can be modeled using an S-shaped truncation curve obtained by combining the rate equation for soluble oligomer addition and Equation 16 with either Equation 18 or Equation 19 (Figure 5b). The curve is not strictly a truncation selection curve, but Kimura and Crow (1978) have shown that selection is almost as efficient when it follows an S-shaped cumulative distribution curve (CDF or CCDF). In order to generate the figure, I assigned the same values of $k_b$ and $[U]^2$ to all of the polypeptide chains in the rate equation for soluble oligomer addition (I assumed the formation of trimers and tetramers was negligible). These can be considered the average values of $k_b$ and $[U]^2$ for aggregation-prone proteins in a hypothetical organism. These values were combined into a single parameter, $k_1 = k_b[U]^2$, which quantifies the propensity of proteins to aggregate. The value of $k_1$ should increase with the stressfulness of the environment and the susceptibility of an organism's proteins to aggregation (see below). Figure 5c shows the results when Equation 17 is used to generate a truncation curve. The same value of $hs$ is assigned to each suboptimal allele, which can be taken as an average of the $hs$ values for all suboptimal alleles.
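The way an S-shaped curve over heterozygosity arises, and how it shifts with $k_1$, can be sketched as follows. The steady-state closure used here is an assumption: removal is taken to be saturable, $R = R_{max}[O]/(K + [O])$, and balanced against a dimerization-only addition rate of the form used in Equation 11. This is a stand-in chosen for illustration and should not be read as the paper's Equation 18. All names and parameter values are hypothetical.

```python
import math

def steady_state_oligomer(addition_rate, r_max, k_half=1.0):
    """ASSUMED closure: saturable removal R = r_max*[O]/(k_half + [O]) balanced
    against addition. This is a stand-in, not the paper's Equation 18."""
    if addition_rate >= r_max:
        return math.inf                   # removal is overwhelmed; no steady state
    return k_half * addition_rate / (r_max - addition_rate)

def lognormal_ccdf(c, c50, sigma):
    """Shape of Equation 16 with W_max = 1."""
    if c <= 0:
        return 1.0
    if math.isinf(c):
        return 0.0
    return 0.5 * math.erfc((math.log(c) - math.log(c50)) / (sigma * math.sqrt(2)))

def relative_fitness(n_het, total_loci, k1, r_max, c50=1.0, sigma=0.3):
    """Fitness versus number of heterozygous loci for a given k1."""
    addition = k1 * (total_loci - 0.5 * n_het)      # dimerization-only rate
    return lognormal_ccdf(steady_state_oligomer(addition, r_max), c50, sigma)

# Hypothetical settings: 100 loci, two stress levels expressed through k1.
mild = [relative_fitness(n, 100, k1=1.0, r_max=150.0) for n in range(0, 101, 10)]
harsh = [relative_fitness(n, 100, k1=1.3, r_max=150.0) for n in range(0, 101, 10)]
```

Under these assumptions fitness rises sigmoidally with the number of heterozygous loci, and raising $k_1$ moves the steep part of the curve toward higher heterozygosity, which is the qualitative behavior the text describes for harsher environments.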

The model depicted in Figures 5b and 5c may provide a physical basis for truncation selection for heterozygosity. According to the model, organisms are periodically exposed to stresses that lead to an accumulation of soluble oligomers, which in turn results in a sharp decline in viability. Only highly heterozygous individuals will be found in stressful environments because less heterozygous individuals will be on the wrong side of the truncation curve. This illustrates the severity of truncation selection, which is not typically associated with gradual evolution. If all the individuals in a species are on the wrong side of a truncation curve, then the species will go extinct. Thus, some individuals must be on the right side of the truncation curve prior to the species being subjected to truncation selection. In other words, some of the individuals in the species must be “pre-adapted” to the environment.

Another interesting property of the truncation selection model is that it depends only on the number of heterozygous gene loci, and not on the identity of the gene loci. This is analogous to a “colligative property” in chemistry. This feature is important because it makes selection for heterozygosity compatible with sexual reproduction. Offspring are not going to be heterozygous at the same gene loci as their parents, but they will have, on average, the same number of heterozygous gene loci as their parents (assuming random mating in the population), so they should have the same overall fitness. Truncation selection allows organisms to substitute heterozygosity at one gene locus for heterozygosity at another gene locus without suffering significant fitness costs because the selection is acting on the overall level of heterozygosity, not on heterozygosity at any particular gene locus. The colligative nature of heterozygous advantage also applies to Equation 11, which relates heterozygosity to the metabolic efficiency of an organism. Again, overall metabolic efficiency depends on the number of heterozygous gene loci, not on the identity of the gene loci. Of course, there is the caveat that only some of the proteins produced by an organism are aggregation-prone, thus the identity of the gene loci is not completely unimportant. Rather, the colligative nature of heterozygous advantage is confined to the subset of gene loci that code for aggregation-prone proteins.

The truncation selection model has one important consequence. Factors that increase the accumulation of soluble oligomers in organisms should lead to truncation selection for higher heterozygosities. These factors can be broken up into two categories: 1) environmental stresses that promote protein aggregation, and 2) inherent characteristics of proteins that affect their propensity to aggregate. Section 4 will discuss how these factors interact to create selection for different ploidy levels, so it will be useful to describe how each promotes protein aggregation.


3.3. Environmental Stress 

Temperature, water stress, and hypoxia can promote protein aggregation. High and low temperatures denature proteins, which leads to high concentrations of unfolded polypeptide chains (Becktel and Schellman 1987). The values of the rate law constants for protein aggregation also increase with temperature (Wang and Roberts 2013). Water stress (desiccation, freezing, high salinity) is created by the limitation of liquid water inside the organism, which results in high concentrations of unfolded polypeptide chain (because the volume of solvent is limited). Water stress can also cause proteins to unfold by weakening the hydrophobic effect (Prestrelski et al. 1993; Allison et al. 1999). In addition, water stress results in molecular crowding, which increases the value of the rate law constants for protein aggregation (Ellis 2001; Smallwood and Bowles 2002; Chebotareva et al. 2004; Ellis and Minton 2006; White et al. 2010). Combining the effects of temperature and water stress is particularly harsh. For example, cold temperatures can cause proteins to denature, which greatly enhances the rate of protein aggregation when combined with freezing (Franks et al. 1990; Smallwood and Bowles 2002; Dias et al. 2009; Singh et al. 2009). Finally, hypoxia can hinder the ability of organisms to remove soluble oligomers as they form because many molecular chaperones require ATP to bind to unfolded polypeptide chains (Patterson and Hohfeld 2006; Lotz et al. 2010). The physical stresses that promote protein aggregation should cause the truncation selection curves depicted in Figure 5 to shift to higher heterozygosity values, which should result in more heterozygous individuals living in harsh environments.

3.4. Protein Length and IURs

The intrinsic characteristics of proteins can also affect their propensity to aggregate. Olzscha et al. (2011) determined what structural features increase a protein's susceptibility to aggregation using molecular templates that seed protein aggregation in human cells. They found that protein size and the presence of intrinsically unstructured regions (IURs) were the two characteristics that most enhanced a protein's susceptibility to aggregation. Both of these characteristics are associated with proteins that contain multiple domains (Dunker et al. 2005; Fong and Panchenko 2010).

A domain is a sequence of amino acids that, if separated from the rest of the polypeptide chain, would still fold into its proper conformation and function normally. A multi-domain protein is a protein that contains multiple domains. Some multi-domain proteins can be considered a string of proteins that are joined together. In fact, proteins that exist separately in some species may be found as parts of multi-domain proteins in other species, a phenomenon called “domain accretion” (Koonin et al. 2002; Basu et al. 2009). As expected, multi-domain proteins tend to be larger than single-domain proteins.

Multi-domain proteins also tend to contain IURs. The reasons for this are less obvious and still debated (Dunker et al. 2005; Fong and Panchenko 2010). IURs can serve as inter-domain linkers that hold multiple domains together in a protein (Tompa 2002; Tompa 2005). IURs can also occur in or near the binding sites of proteins, and may facilitate protein interactions (Dunker et al. 2005; Uversky and Dunker 2010). This would make IURs particularly relevant to multi-domain proteins because such proteins tend to have multiple interaction partners (Tordai et al. 2005; Ekman et al. 2007). Indeed, the need to interact with multiple partners is one reason why some proteins contain multiple domains (Patthy 2003; Tordai et al. 2005; Tan et al. 2005). Each domain can facilitate interactions with different partners.

As stated previously, Olzscha et al. (2011) found that aggregation-prone proteins tend to be large and tend to contain IURs, which are characteristics of multi-domain proteins. Large proteins are more aggregation-prone than smaller proteins because they take longer to fold (or refold) into their native conformations, which increases the exposure of aggregation-prone amino acid sequences that are typically buried within the interior of the folded protein (Netzer and Hartl 1997; Goldschmidt et al. 2010; Olzscha et al. 2011). This would be true for both recently synthesized proteins and for proteins that denature due to an environmental stress. Large proteins also potentially contain more binding sites than smaller proteins, which would increase the likelihood for an oligomerization reaction to occur when large proteins approach each other (e.g. Andersen et al. 2010). Proteins that contain IURs tend to be aggregation-prone because the conformational flexibility of IURs allows them to align properly to form the cross-β sheet structure that holds aggregates together (Carrio et al. 2005; Nelson et al. 2005; Ventura 2005; Wang et al. 2008; Olzscha et al. 2011; Ramshini et al. 2011; Stroud et al. 2012). In fact, several amyloid-associated diseases are due to polypeptides that contain IURs (Uversky et al. 2008; Uversky et al. 2009; Babu et al. 2011). In addition, proteins containing IURs are thought to have relatively short life-spans and rapid turnovers because they are continuously degraded by each cell's proteolytic machinery (Wright and Dyson 1999). Multi-domain proteins can also aggregate via “domain swapping.” This occurs when two domains in a multi-domain protein molecule are supposed to bind to each other, but instead they bind to the domains on another molecule (Nelson and Eisenberg 200?; Rousseau et al. 201?).

The abundance of large and IUR-containing proteins varies significantly across the domains of life. On the whole, prokaryotes produce shorter proteins than eukaryotes. About 65% of prokaryote proteins are multi-domain whereas 80% of eukaryote proteins are multi-domain (Apic et al. 2001). Furthermore, the median length of eukaryote proteins is 50% longer than the median length of prokaryote proteins (Brocchieri and Karlin 2005). The relative abundances of proteins containing IURs are 2%, 4%, and 33% for archaea, bacteria, and eukaryotes, respectively (Ward et al. 2004). These differences have caused prokaryotes and eukaryotes to process their proteins differently. For example, prokaryotes typically translate proteins at a rate of 10-20 amino acids per second whereas eukaryotes typically translate proteins at a rate of 3-8 amino acids per second (Siller et al. 2010). As a consequence, most protein folding is post-translational in prokaryotes but co-translational in eukaryotes (Netzer and Hartl 1997). Eukaryotes also have more complex chaperone systems that assist with the proper folding of their de novo and stress-denatured proteins (Albanese et al. 2006). Thus, protein folding is more elaborate in eukaryotes than prokaryotes because their proteins are more aggregation prone. The truncation selection curves depicted in Figure [...] should be shifted to higher heterozygosity values for organisms that synthesize more aggregation-prone proteins, and indeed, heterozygosity is essentially a eukaryotic phenomenon. There are no heterozygous bacteria and archaea. This argument will be expanded later to explain the haploid-diploid transition.

A species’s geographic distribution is affected by both environmental stress and its proteins’ susceptibility to aggregation. Both Koonin et al. (2002) and Brocchieri and Karlin (2005) proposed that hyperthermophiles are common in the Domain Archaea because the species in this domain produce short proteins that are not prone to aggregation. However, the idea can be extended to all prokaryotes that live in extreme environments. Thermophiles can be found among both the Bacteria and the Archaea. Additionally, both domains of life contain species that are able to grow at extremely high salinities (Kunte et al. 2002), withstand desiccation (Potts 1994; Alpert [...]), or grow in freezing conditions (Russell 1998). Not all prokaryotes can grow in extreme environments because additional adaptations, such as those promoting membrane integrity and DNA stability, are required (Konings et al. 2002; Trivedi et al. 2005). However, prokaryotes should be “pre-adapted” to these extreme environments because they are not removed by truncation selection when they migrate into them. In contrast, truncation selection could prevent organisms that synthesize more aggregation-prone proteins, including many eukaryotes, from migrating into these environments (because their relative fitness would be zero), thereby limiting their distribution to less harsh environments.


4. Ploidy Level

4.1. Environmental Stress and Polyploidy

If the expression of two allozymes can inhibit the formation of soluble oligomers, then the expression of three or more allozymes should further inhibit their formation. Equation [...] predicts that polyploid organisms should have lower rates of protein aggregation than diploid organisms. In addition, many polyploids are hybrid species, and are heterozygous at more gene loci than diploids (Otto and Whitton 2000). Therefore, polyploid organisms should be able to tolerate more severe physical stresses than diploids, and in fact, may be “pre-adapted” to harsh environments in which diploid organisms cannot survive due to truncation selection. Indeed, since polyploidy and hybridization may be accompanied by fitness costs (such as high genetic loads, slower growth rates, and outbreeding depression), polyploid species may be restricted to harsh environments where they do not have to compete with diploid species (Otto and Whitton 2000).

The occurrence of polyploid plants at high latitudes and altitudes was first observed in the 1940s (Stebbins 1950; Stebbins 1984), and recent research has confirmed that polyploid plants and animals frequently occur in frozen environments (Beaton and Hebert 1988; Adamowicz et al. 2002; Brochmann et al. 2004; Lundmark and Saura 2006; Aguilera et al. 2007; Otto et al. 2007; Adolfsson et al. 2009). Brochmann et al. (2004) analyzed data from the Pan-Arctic Flora (PAF) Checklist (Elven et al. 2003) and found that 73.7% of arctic plants are polyploid. Plants in the most northerly arctic zone were hexaploid on average. In addition, 39.2% of species within this zone were 7-ploid or higher, and 17.8% of the species were 9-ploid or higher. However, since these plants reproduce primarily through self-fertilization, the heterozygosity of these plants is about half what their ploidy level would indicate.

Polyploid plants are also positively associated with arid zones and deserts (Spellenberg 1981; Rossi et al. 1999; Hunter et al. 2001; Pannell et al. 2004; Joly et al. 2006; Schuettpelz et al. 2008). Senock et al. (1991) and Hao et al. (2013) showed that the ploidy level of Atriplex canescens increases in regions of the Chihuahuan Desert with increasing drought stress. Most resurrection plants, which grow in deserts and are capable of withstanding very high levels of desiccation, are polyploid (Bartels and Salamini 2001; Rodriguez et al. 2010). Several studies have shown that a plant’s drought tolerance increases with its ploidy level (Al Hakimi et al. 1998; Xiong et al. 2006). For example, Ramsey (2011) compared the drought tolerance of hexaploid and tetraploid individuals belonging to Achillea borealis and found the hexaploids were more tolerant. Furthermore, Ramsey’s analysis of neo-hexaploid A. borealis individuals showed that a third of the drought tolerance was achieved via genome duplication rather than adaptation. Another interesting example of polyploid adaptation to water stress may be Sequoia sempervirens, which is hexaploid and must cope with water stress due to its extreme height (Ahuja and Neale 2002; Koch et al. 2004; Oldham et al. 2010). Polyploid animals are also positively associated with arid zones. For example, polyploid lizards are found in deserts (Tocidlowski et al. 2001; Kearney 2003) and so is the only known polyploid mammal (Gallardo et al. 1999; Gallardo et al. 2004; Svartman et al. 2005; Gallardo et al. 2006; Teta et al. 2014).

Polyploidy should also be associated with salinity stress because many species change their expression of Hsps in both hypersaline and hyposaline environments (Chang 2005; Downs et al. 2009; Tine et al. 2010; Monari et al. 2011). However, evidence for a link between polyploidy and salinity is unfortunately tenuous. Polyploid plants appear to have greater tolerance to salt stress than their diploid relatives (Tal and Gardi 1976; Shannon and Grieve 1999; Ashraf et al. 2001; Kumar et al. 2009). Also, several species of polyploid brine shrimp, Artemia, have been identified (Browne and Bowen 1991; Amat et al. 2007). Several papers have suggested that the radiation of polyploid Artemia is related to the Messinian salinity crisis (e.g. Agh et al. 2007), but other papers have disputed this claim (Baxevanis et al. 2006). Various papers have found that parthenogenic Artemia tolerate both higher and lower salinities than sexuals, but none of these papers distinguish between polyploid and diploid parthenogens (Browne and MacDonald 1982; Zhang and King 1993; El-Bermawi et al. 2004; Agh et al. 2007). The distribution of polyploid Artemia may instead be driven by latitude (Zhang and Lefcort 1991).

The geographical distribution of polyploid organisms suggests that they have an advantage in environments that promote protein aggregation. These trends can be understood in terms of a truncation selection model. Truncation selection would prevent individuals with low heterozygosities from migrating into harsh environments, thereby preventing some diploid species from expanding into harsh environments and adapting to them. In contrast, polyploid individuals may be partially pre-adapted to harsh environments because their heterozygosities are already sufficiently high to avoid truncation selection. Over time, the polyploids will further adapt as favorable alleles increase in frequency. A dramatic illustration of this may be given in Fawcett et al. (2009), which argues that polyploids may have preferentially survived the Cretaceous-Paleogene mass extinction event.

4.2. Organism Complexity, Protein Interaction Networks, and the Haploid/Diploid Transition

Protein aggregation may also potentially explain why some species are haploid and others are diploid. The most suggestive evidence for this is the relative stress tolerances of haploid and diploid organisms. Animals, for instance, have lower thermotolerances than bacteria and archaea, and some fungi species (Portner 2001; Salar and Aneja 2007). Even the Pompeii worm, which grows in hydrothermal vents, cannot withstand temperatures greater than 55°C for 2 hours (Ravaux et al. 2013). Similarly, animals are less tolerant to desiccation stresses than prokaryotes and fungi, with the notable exceptions of bdelloid rotifers and tardigrades^ (Alpert 2006). The general trend appears to be that animals are less stress tolerant than fungi, which in turn are less stress tolerant than bacteria and archaea. Considering that the ability to produce HSPs appears to control the upper thermal tolerances of organisms (Portner 2001; Dilly et al. 2012), the relative stress tolerances of animals, fungi, bacteria, and archaea may reflect the relative levels of protein aggregation that they must cope with.

The trend in stress tolerance seems to match a trend in the ploidy levels of these organisms. Bacterial and archaeal species are haploid, animal species are diploid, and fungal species may either be haploid or diploid. Thus, bacterial and archaeal species may never have to cope with truncation selection for heterozygosity while all animal species may experience such truncation selection, even in relatively mild conditions. Fungal species may or may not experience truncation selection for heterozygosity depending on the environmental conditions or the stage in their life-cycle (see below). An underlying mechanism may generate the different levels of protein aggregation in these organisms, which may explain their relative stress tolerances and their ploidy levels.

^Polyploidy may explain these exceptions. Bdelloid rotifers are descended from a tetraploid ancestor, and most freshwater and terrestrial tardigrades are polyploid (Bertolani 2001; Hur et al. 2008).

But why do different species experience different levels of protein aggregation? One explanation is that some species produce more aggregation-prone proteins than others. Olzscha et al. (2011) found that aggregation-prone proteins in human cells tend to be large and tend to contain IURs, which are characteristics of multi-domain proteins (Dunker et al. 2005; Fong and Panchenko 2010). These proteins often have numerous interaction partners and are often involved in
signal transduction and regulatory processes (Dunker et al. 200[...]; Warringer and Blomberg 2006; Uversky and Dunker 2010; Olzscha et al. 2011). Such proteins facilitate organism complexity because they have important roles in coordinating and regulating the biochemical activities within and between cells, which is necessary for the development of complex organisms (Rubin et al. 2000; Patthy 2003; Dunker et al. 2005; Tan et al. 2005; Tordai et al. 2005; Ekman et al. 2007; Uversky and [...]). [...] (2008) found that the human protein interaction network (PIN) contains approximately 650,000 protein interactions while the S. cerevisiae PIN contains approximately 25,000-35,000 protein interactions.

The proteins that facilitate complex PINs tend to be large and multi-domain because multiple domains are necessary to facilitate different interactions (Rubin et al. 2000; Tan et al. 2005; Tordai et al. 2005; Ekman et al. 2007; Zmasek and Godzik 2011). For example, Xia [...] found that the number of interaction domains per protein increases with the number of cell types in an organism. Likewise, Wang et al. (2005) found that proteins shared by S. cerevisiae, D. melanogaster, and H. sapiens are similar in length; but proteins found in D. melanogaster and H. sapiens, but not in S. cerevisiae, are on average 22% longer than proteins shared by all three species. Finally, Warringer and Blomberg (2006) found that S. cerevisiae proteins longer than 770 amino acids have more interaction partners, on average, than shorter proteins. Particularly abundant among these large proteins were transport proteins, proteases, kinases, and other signaling proteins. These proteins are responsible for signal transduction and regulating biochemical pathways (Sopory and Munshi 1998; Manning et al. 2002; Lopez-Otín and Hunter 2010). In multicellular organisms, these proteins play vital roles in intercellular communication, regulation of the cell cycle, and cellular differentiation (LeMosy et al. 2001; Schaller 2004; [...] 2006; van der Hoorn 2008; Keshet and Seger 2010). Thus, large proteins play an important role in coordinating and regulating the biochemical activities in organisms.

Proteins containing IURs also facilitate complex PINs. IURs can serve as flexible inter-domain linkers that allow domains to move freely with respect to each other (Tompa 2002; Tompa 2005). These inter-domain linker regions are not merely structural, but often serve as binding sites between proteins and their interaction partners (Balazs et al. 2009). IURs allow proteins to bind to multiple partners, or IURs can allow [...] ([...] et al. 1996; Tompa et al. 2005; Oldfield et al. 2008; Tyagi et al. 2009; Bustos 2012). IURs may also speed up binding reactions via the “fly-casting” mechanism (Shoemaker et al. 2000). Finally, IURs provide easily accessible sites for post-translational modifications to proteins, which makes them important for biochemical regulation (Dunker et al. 2002; Kurotani et al. 2014). As a consequence of these structural properties, up to 94% of transcription factors and more than 70% of signaling proteins in eukaryotes contain IURs (Iakoucheva et al. 2002; Liu et al. 2006; Uversky and Dunker 2010). In short, IURs are important mediators of protein interactions, and proteins containing IURs are responsible for most signal transduction and biochemical regulation in eukaryotes.

Thus, it appears that the proteins that facilitate organism complexity are also the proteins that make complex organisms more sensitive to environmental stresses. This can explain why higher ploidy levels are associated with complex organisms. Bacteria and archaea never experience truncation selection for heterozygosity, even in harsh environments. In contrast, animals must be diploid, even in mild environments, because they produce an abundance of proteins with numerous interaction partners. These proteins are typically large and typically contain IURs, so they tend to be aggregation-prone. As a consequence, animals must be heterozygous in order to be on the right side of the truncation curve (Figure [...]), and that requires them to be at least diploid. Organisms of intermediate complexity, such as plants and fungi, may be either haploid or diploid, depending on the species. Whether such species are haploid or diploid will depend on the abundance of aggregation-prone proteins that they produce and on the stresses to which their proteins are exposed. If this hypothesis is true, then the diploidy of complex organisms is largely necessitated by the physical constraints imposed by PIN complexity. It may be the case that the thermodynamic benefits of heterozygosity (Subsection 2.3) increase with an organism’s complexity.
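
The reasoning in this subsection can be made concrete with a toy calculation. The sketch below is illustrative only and is not the model equation developed earlier in the paper: it assumes a hypothetical aggregation "burden" that scales with a protein aggregation propensity and an environmental stress factor, and that is divided by the number of distinct allozymes expressed per locus. The function names, the tolerance threshold, and all numerical values are assumptions chosen for illustration, and full heterozygosity is assumed (one distinct allozyme per chromosome set).

```python
from typing import Optional


def aggregation_burden(propensity: float, stress: float, n_allozymes: int) -> float:
    """Toy aggregation burden in arbitrary units.

    Assumption (not from the paper): expressing n distinct allozymes
    divides the burden by n.
    """
    if n_allozymes < 1:
        raise ValueError("n_allozymes must be at least 1")
    return propensity * stress / n_allozymes


def survives_truncation(propensity: float, stress: float,
                        n_allozymes: int, tolerance: float = 1.0) -> bool:
    """Step-function (truncation) selection: viable only at or below the tolerance."""
    return aggregation_burden(propensity, stress, n_allozymes) <= tolerance


def minimum_ploidy(propensity: float, stress: float,
                   tolerance: float = 1.0, max_ploidy: int = 12) -> Optional[int]:
    """Smallest ploidy that survives truncation, assuming full heterozygosity.

    Returns None if no ploidy up to max_ploidy is sufficient.
    """
    for ploidy in range(1, max_ploidy + 1):
        if survives_truncation(propensity, stress, ploidy, tolerance):
            return ploidy
    return None


if __name__ == "__main__":
    # Hypothetical (propensity, stress) pairs; values are arbitrary.
    cases = {
        "prokaryote-like proteome, harsh environment": (0.5, 2.0),
        "animal-like proteome, mild environment": (1.5, 1.0),
        "animal-like proteome, harsh environment": (1.5, 2.0),
    }
    for label, (propensity, stress) in cases.items():
        print(label, "-> minimum ploidy:", minimum_ploidy(propensity, stress))
```

Under these assumed numbers, the low-propensity proteome remains viable as a haploid even under stress, the high-propensity proteome needs diploidy under mild conditions, and the same proteome needs a higher ploidy under harsh conditions, which mirrors the qualitative argument of this subsection.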


4.3. Developmental Homeostasis 

The hypothesis presented in the previous subsection shares several connections with the pioneering work from early metabolic heterosis theorists, such as I. Michael Lerner. Lerner (1954) found that inbred plants and animals exhibited more morphological variability than outbreds, which Lerner attributed to a decline in developmental homeostasis (or developmental stability). In other words, less heterozygous individuals display greater degrees of aberrant growth and development, which result in morphological imperfections such as bilateral asymmetry. Over the years, other researchers have corroborated Lerner’s findings (Robertson and Reeve 1952; Eanes 1978; Mitton 1978; Soule 1979; Mitton 1995), which in turn led to the studies that found heterozygosity correlates with growth rate, metabolic efficiency, and protein turnover (Singh and Zouros 1978; Zouros et al. 1980; Koehn and Shumway 1982; Hawkins et al. 1986; Hawkins et al. 1989). Thus, the field of metabolic heterosis studies can trace its roots back to Lerner’s work on developmental homeostasis.

The hypothesis presented in this subsection can explain how low heterozygosity would disrupt developmental homeostasis. The proteins that regulate proper growth and development are also the proteins that are most susceptible to aggregation. Both environmental stress and inbreeding can lead to elevated rates of aggregation for signaling and regulatory proteins, which could potentially disrupt intercellular communication, regulation of the cell cycle, proper cell differentiation, etc. The cumulative result would be a disruption in developmental homeostasis as described by I. Michael Lerner. Thus, protein aggregation can potentially explain much of the phenomena that concerned early researchers working with allozymes (see Mitton 1997 for review).

4.4. Plants and Alternation of Generations 

The trends described in the previous subsection hold for plant species as well. Plants typically alternate between a haploid gametophyte generation and a diploid sporophyte generation. However, the dominant generation varies between divisions. For example, bryophytes have a dominant gametophyte generation and a short-lived sporophyte generation while spermatophytes are primarily diploid (the sporophyte generation is dominant). In addition, ferns have independent gametophyte and sporophyte generations. The theory presented in Subsection 4.2 predicts that fewer aggregation-prone proteins should be produced by the haploid gametophytes than by the diploid sporophytes. Two types of circumstantial evidence support this prediction.

First, there is a relationship between complexity and ploidy level. The haploid-dominant bryophytes are relatively simple plants, typically 2 cm tall and one cell thick. The diploid-dominant spermatophytes are complex and include flowering plants. The ferns alternate between a haploid generation that is simple, resembling bryophytes, and a diploid generation that is significantly larger and more complex. Thus, according to the hypothesis presented in the previous subsection, spermatophytes and fern sporophytes should have more complex PINs and should synthesize more aggregation-prone proteins than bryophytes and fern gametophytes.

The second type of evidence is the relative stress tolerances of the different plant divisions. The relationship between ploidy level and stress tolerance in plants is similar to the trend described in the previous subsection for animals, fungi, and prokaryotes. Bryophytes are much more tolerant of freezing, desiccation, and salinity stresses than spermatophytes (Alpert 2000; Oliver et al. 2005; Wang et al. 2009; Gaff and Oliver 2013). Their ability to tolerate such stresses is comparable to lichens, and they can be found, along with lichens, in extremely cold and arid environments not inhabited by more complex plants (Longton 1988; Alpert 2006; Proctor and Tuba 2002; Kranner et al. 2008). The desiccation tolerance of ferns is more complex. Fern sporophytes are comparable to spermatophytes in their ability to tolerate desiccation, but fern gametophytes are comparable to bryophytes (Watkins, Jr. et al. 2007; Hietz 2010). In fact, some tropical fern species have lost the sporophyte stage of their life cycle and now exist as asexually reproducing gametophytes, which allows them to live in colder and drier habitats than their sporophyte-producing relatives (Farrar 1978; Farrar 19[...]). Thus, bryophytes and fern gametophytes probably express fewer aggregation-prone proteins than spermatophytes and fern sporophytes. This would explain the relative order of plant stress tolerances: bryophyte ≈ fern gametophyte > spermatophyte ≈ fern sporophyte. The trend corresponds to the relative complexity of the plant divisions and to their ploidy levels.

The above observations on relative stress tolerances should not be taken to imply that bryophytes are never subjected to truncation selection for heterozygosity. For example, allodiploid species of bryophytes (hybrids with two sets of chromosomes) have been identified, and they increase in frequency with latitude (Wyatt et al. 1988; Ricca et al. 2008). Bryophyte species probably produce few enough aggregation-prone proteins that they do not experience truncation selection for heterozygosity in mild environments, but they might produce enough aggregation-prone proteins that they experience truncation selection in harsher environments. Thus, bryophytes, ferns, and spermatophytes all exhibit ploidy level increases in harsh environments. The difference between bryophytes and spermatophytes is that bryophytes have a haploid chromosome set baseline whereas spermatophytes have a diploid chromosome set baseline. Also, the sporophyte generations of bryophytes are diploid and cannot survive in environments as harsh as those tolerated by the gametophyte generations (Stark et al. 2007). This may indicate that the sporophyte generation of bryophytes expresses aggregation-prone proteins that are not produced by the gametophyte generation.

When comparing the life-cycles of green algae, bryophytes, ferns, and spermatophytes, there appears to be a progression from haploid dominant species, to species that alternate between haploid and diploid generations, to diploid dominant species. In other words, diploid dominant species did not evolve directly from haploid dominant species, but instead, from species that alternated between haploid and diploid generations. The truncation selection model developed in this paper can explain this evolutionary sequence. Recall that a species will go extinct if all individuals are on the wrong side of a truncation curve, and that diploid species produce aggregation-prone proteins that contain multiple domains and IURs. These two statements imply that truncation selection would inhibit the evolution of haploid species that produce an abundance of aggregation-prone proteins, so the need for diploidy would never arise. Diploid individuals are unlikely to successfully compete against haploid individuals if there is no immediate advantage to diploidy, especially if there are advantages to haploidy as described in the next subsection.

However, a spore-producing generation can circumvent this barrier by providing an immediate advantage to diploidy. Sexual reproduction requires haploid dominant species to possess at least a temporary diploid generation. For such organisms, sexual reproduction is often accompanied by the production of spores or cysts. This may facilitate inbreeding avoidance since spores often serve as a means of dispersal, or a mixed cytoplasm may increase the ability of a spore or cyst to tolerate environmental stresses (Equation 16). A mixed cytoplasm may also increase the longevity of a spore or cyst since it would slow down the accumulation of soluble oligomers in the dormant organism over time. For fungi, there is some evidence that sexual spores are more stress tolerant than asexual spores^ (Grishkan et al. 2003; Dijksterhuis 2007; Trapero-Casas and Kaiser 2007). Regardless, sexual reproduction requires organisms to possess at least a diploid zygote in their life-cycle, and this often accompanies spore production. Some species have evolved a separate diploid (or dikaryotic) spore-producing generation, such as sporophytes in plants or ascocarps and basidiocarps in dikaryotic fungi, because it aids with spore dispersal. This spore-producing generation has the potential to evolve complexity over time because truncation selection would not prevent it from expressing genes that encode the aggregation-prone proteins that facilitate complexity. In contrast, truncation selection would prevent strictly haploid spore-producing species from becoming more complex over time.

Given the constraints that truncation selection would impose on haploid organisms, the evolution of diploid-dominant organisms may have proceeded along the following steps: 1) bacterial and archaeal species produce proteins that are not susceptible to aggregation, so they do not have a heterozygous diploid stage in their life-cycle, even when producing spores; 2) some eukaryotic species (e.g. some algal and fungal species) produce enough aggregation-prone proteins that their spores benefit from a mixed cytoplasm, which requires at least a temporary diploid stage in their life-cycle^; 3) some species have longer-lived diploid, spore-producing generations in their life-cycles, perhaps because they facilitate spore dispersal (e.g. bryophytes); 4) the diploid stages of some species’ life-cycles have become more complex over time because truncation selection has not prevented them from producing aggregation-prone proteins with numerous interaction partners (e.g. ferns, dikaryotic fungi); and 5) the haploid life-cycle stage has become temporary in some multicellular species, which has resulted in diploid-dominant species (e.g. animals and spermatophytes). Thus, the evolution of diploid-dominant species would be a very gradual process, unlike the evolution of polyploidy, which occurs in a single generation.

The freshwater green algae Charales may provide support for the above evolutionary sequence. Charales are exclusively haploid, lacking a sporophyte stage in their life-cycle (Becker and Marin 2009). They are also relatively complex compared to other green algae, but are still much simpler than angiosperms (Lee 2008). However, Graham and Gray (2001) argue that Charales “are not competitive with freshwater angiosperms and rarely share freshwater habitats with them.” In fact, the fossil record shows that Charales’s species diversity has declined over time since the appearance of angiosperms, and that Charales may be an evolutionary dead end (Graham and Gray 2001). Given that Charales species are closely related to embryophytes, and that complexity seems to have given angiosperms a competitive advantage over Charales species, it is natural to ask why Charales species did not become more complex over time. Competition between Charales individuals should have led to an increase in complexity over time, just as competition favored the evolution of complex angiosperms.

^The spores in these studies were haploid, but they could have inherited a mixed cytoplasm from their diploid parent cells during meiosis.

^This could have evolved when cells from closely related species fused together to form a diploid cell, which would be analogous to the formation of polyploid species via hybridization.

The absence of a sporophyte stage in the Charales life-cycle may explain their relatively simple morphology since, as argued in the previous four paragraphs, truncation selection would impose an upper limit on the abundance of aggregation-prone proteins that the haploid Charales species can produce. This in turn would impose an upper limit on the complexity of Charales species because proteins with numerous interaction partners, which are required in complex organisms, tend to be aggregation-prone. Thus, Charales species may have hit an evolutionary dead end because they do not possess a sporophyte generation in their life-cycle. In contrast, an upward-growing sporophyte generation is beneficial to embryophytes because it facilitates spore dispersal through the air. The possession of a diploid sporophyte generation may have removed a barrier to the evolution of complexity in some embryophyte lineages, which has allowed the spermatophytes to become large, complex organisms.


4.5. The Advantages of Haploidy 

The theory developed in Subsection 4.2 attempts to explain why diploidy is advantageous. However, the occurrence of organisms that alternate between haploid and diploid life-cycle stages (e.g. ferns and Ulva) suggests that haploidy has advantages. Such organisms could stay permanently diploid (with brief haploid stages for the purpose of sexual reproduction) if there were no advantage to haploidy. Two such potential advantages are a lower genetic load and a faster growth rate.

Genetic load should favor the evolution of lower ploidy levels (Mable and Otto 1998). The genetic load in a population is directly proportional to the mutation rate (Haldane 1937). Thus, if mutation rates are relatively constant at all ploidy levels, then a population of diploid individuals should have twice the genetic load of a population of haploid individuals (Otto and Whitton 2000; Gerstein and Otto 2009). As a consequence, populations of haploid individuals should have higher mean fitnesses than populations of otherwise identical diploid individuals (Mable and Otto 1998).
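
The load argument can be illustrated with a short numerical sketch. It is a simplified illustration rather than a calculation from the cited papers: it assumes that the load equals the mutation rate multiplied by the number of genome copies, and that mean fitness declines as exp(-load), which corresponds to multiplicative fitness effects across loci. The parameter `mutation_rate` (deleterious mutations per haploid genome per generation) and its value are hypothetical.

```python
import math


def genetic_load(mutation_rate: float, ploidy: int) -> float:
    """Load assumed proportional to mutation rate and to the number of genome copies."""
    if ploidy < 1:
        raise ValueError("ploidy must be at least 1")
    return ploidy * mutation_rate


def mean_fitness(mutation_rate: float, ploidy: int) -> float:
    """Mean population fitness, assuming multiplicative effects: exp(-load)."""
    return math.exp(-genetic_load(mutation_rate, ploidy))


if __name__ == "__main__":
    mutation_rate = 0.1  # hypothetical value, for illustration only
    for ploidy, label in [(1, "haploid"), (2, "diploid")]:
        load = genetic_load(mutation_rate, ploidy)
        fitness = mean_fitness(mutation_rate, ploidy)
        print(f"{label}: load = {load:.2f}, mean fitness = {fitness:.3f}")
```

With any positive mutation rate, the diploid load in this sketch is exactly twice the haploid load, so the haploid population has the higher mean fitness, as the paragraph above argues.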

Higher ploidy levels are also disadvantageous because they lead to slower growth rates. This has been observed in polyploid plants, which typically grow and mature more slowly than their diploid relatives (Otto and Whitton 2000; Hessen et al. 200[...]). Also, diploid gametophyte lines of one bryophyte species grow ≈70% as fast as haploid lines on full medium (Schween et al. 2005). Thus, haploidy might be beneficial to organisms that face strong selection for high growth rates. Haploidy would also be particularly beneficial to single-celled organisms, in which cell division rates are directly tied to fecundity.
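
To show how a modest difference in growth rate compounds, the following sketch assumes simple exponential growth and interprets "≈70% as fast" as a ratio of exponential growth rates. That interpretation is an assumption made for illustration, and the cited measurements may have been defined differently. The function name and the number of doublings are hypothetical.

```python
def relative_size(rate_ratio: float, fast_doublings: float) -> float:
    """Size of the slower grower relative to the faster one.

    Both lines are assumed to start at the same size and to grow exponentially.
    `rate_ratio` is the slow growth rate divided by the fast growth rate, and
    `fast_doublings` is how many times the faster line has doubled.
    """
    if rate_ratio < 0:
        raise ValueError("rate_ratio must be non-negative")
    return 2.0 ** ((rate_ratio - 1.0) * fast_doublings)


if __name__ == "__main__":
    # Diploid line assumed to grow at 70% of the haploid rate.
    for doublings in (1, 5, 10):
        ratio = relative_size(0.7, doublings)
        print(f"after {doublings} haploid doublings, diploid/haploid size = {ratio:.3f}")
```

Under these assumptions the slower line falls to roughly one eighth of the faster line's size by the time the faster line has doubled ten times, which suggests why growth rate could matter when colonizing open ground from a few cells.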

The growth rate hypothesis is particularly useful when trying to understand the life cycles of plants. All plants must compete for limited space (Gurevitch et al. 1990; Gremer et al. 2013), so a faster growth rate would give haploid plants an advantage over diploid plants when attempting to compete for access to land. For instance, bryophytes reproduce asexually via fragmentation and sexually via spore production. In both cases, the gametophyte plants must quickly grow from only a few cells in order to establish themselves in a partition of land. Ferns also disperse themselves via spores made up of only a few cells. Thus, their simple haploid gametophyte generations may be beneficial because they can quickly establish themselves in a partition of land. Then, the ferns reproduce sexually and produce their complex, diploid sporophyte generations, which do not have to compete for access to land because they grow out of their gametophyte parents. In contrast, spermatophytes disperse themselves via seeds that carry entire diploid plant embryos. The plant embryos can quickly establish themselves in a partition of land, despite a slower growth rate, because they are already partially developed prior to germination. This might allow spermatophytes to be diploid, complex organisms for the bulk of their life-cycle, which in turn might allow them to utilize complex structures (e.g. flowers) for all of their life processes, including sexual reproduction.


5. Conclusion 

This paper attempts to provide a biochemical basis for
heterosis and selection for heterozygosity. It then shows
how individuals with higher heterozygosities are favored
with increasing environmental harshness and organism
complexity, which will also favor higher ploidy levels.
In addition, the hypothesis was used to explain the
different life cycles of plants and the results of
numerous experiments that have found that heterozygosity
correlates with reduced protein turnover and higher
metabolic efficiency. Thus, the hypothesis can explain
numerous field observations and experimental data in
which heterozygosity and ploidy level are variables.
Future research should be able to establish whether
heterozygous advantage has a thermodynamic basis, and
whether organism complexity and environmental harshness
are in fact determinants of each species' ploidy level.

The hypotheses developed in this paper have important
implications for breeding more stress-tolerant varieties
of crops. Several algorithms have been developed for
identifying aggregation-prone amino acid sequences in
proteins (Tartaglia et al., 2008; Goldschmidt et al.,
2010). These algorithms can be used to identify the
proteins that are susceptible to aggregation, which in
turn may identify gene loci where heterozygosity is most
beneficial. Such identification may prove helpful in
developing crop varieties that can withstand the physical
stresses caused by global climate change ([...]bell et
al., 2013). There is already some evidence that global
climate change is leading to selection for heterozygosity
in animal populations (Forcada and Hoffman, 2014).
Identifying aggregation-prone proteins may also be
helpful in further improving crop yields, as has occurred
throughout the 20th century.